Regularized Nonlinear Acceleration

Neural Information Processing Systems

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
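The abstract does not spell out the linear system, so the following is only a minimal sketch under stated assumptions: the weights come from a regularized Gram matrix of successive differences between iterates, normalized to sum to one. The names `rna_extrapolate` and `reg`, the normalization of the Gram matrix, and the toy quadratic are all illustrative choices, not taken from the paper.

```python
import numpy as np

def rna_extrapolate(iterates, reg=1e-10):
    """Nonlinear average of iterates (hypothetical helper, not the authors' code).

    iterates: sequence of k+1 vectors x_0, ..., x_k from any base method.
    reg:      assumed regularization strength, relative to the Gram matrix norm.
    """
    X = np.asarray(iterates, dtype=float)   # shape (k+1, d)
    R = np.diff(X, axis=0)                  # differences x_{i+1} - x_i, shape (k, d)
    G = R @ R.T                             # small k x k Gram matrix
    G = G / np.linalg.norm(G, 2)            # assumed normalization (spectral norm)
    k = G.shape[0]
    z = np.linalg.solve(G + reg * np.eye(k), np.ones(k))
    c = z / z.sum()                         # weights sum to one
    return c @ X[:-1]                       # weighted combination of x_0, ..., x_{k-1}

# Toy base method: gradient descent on f(x) = 0.5 x^T A x - b^T x.
A = np.diag([1.0, 2.0, 4.0])
b = np.array([1.0, 2.0, 4.0])
x_star = np.ones(3)                         # exact minimizer, since A x_star = b

x = np.zeros(3)
iterates = [x]
for _ in range(4):
    x = x - 0.2 * (A @ x - b)               # plain gradient step
    iterates.append(x)

x_rna = rna_extrapolate(iterates)
err_last = np.linalg.norm(iterates[-1] - x_star)
err_rna = np.linalg.norm(x_rna - x_star)
```

The base method is never modified here; the extrapolation only reads its iterates, which is what lets such a scheme run alongside any optimizer. Comparing `err_rna` with `err_last` shows how much the weighted average improves on the last iterate for this toy problem.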


SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

Neural Information Processing Systems

SEBOOST applies a secondary optimization process in the subspace spanned by the last steps and descent directions. The method was inspired by the SESOP optimization method for large-scale problems, and has been adapted for the stochastic learning framework. It can be applied on top of any existing optimization method with no need to tweak the internal algorithm. We show that the method is able to boost the performance of different algorithms, and make them more robust to changes in their hyper-parameters. As the boosting steps of SEBOOST are applied between large sets of descent steps, the additional subspace optimization hardly increases the overall computational burden. We introduce two hyper-parameters that control the balance between the baseline method and the secondary optimization process. The method was evaluated on several deep learning tasks, demonstrating promising results.
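As a rough illustration of the idea of a secondary optimization in the subspace spanned by recent steps and a descent direction, here is a deterministic sketch on a quadratic, where the subspace problem has a closed-form solution. It is not the SEBOOST algorithm itself: the stochastic setting, the authors' choice of secondary optimizer, and their two hyper-parameters are not reproduced. The names `subspace_step`, `boosted_descent`, `n_base_steps`, and `history` are hypothetical.

```python
import numpy as np

# Toy objective (assumption): f(x) = 0.5 x^T A x - b^T x.
A = np.diag(np.linspace(1.0, 50.0, 10))
b = np.ones(10)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

def subspace_step(x, directions):
    """Minimize f over x + span(directions); exact because f is quadratic."""
    D = np.stack(directions, axis=1)        # d x m matrix of directions
    # f(x + D a) = f(x) + (D^T g)^T a + 0.5 a^T (D^T A D) a, minimized over a.
    a, *_ = np.linalg.lstsq(D.T @ A @ D, -D.T @ grad(x), rcond=None)
    return x + D @ a

def boosted_descent(x0, n_outer=5, n_base_steps=10, history=2, lr=0.01):
    """Alternate blocks of baseline steps with one subspace optimization."""
    x = x0.copy()
    past_steps = []
    for _ in range(n_outer):
        anchor = x.copy()
        for _ in range(n_base_steps):       # baseline method, left untouched
            x = x - lr * grad(x)
        recent = past_steps[-history:] if history > 0 else []
        directions = [x - anchor, -grad(x)] + recent
        x_new = subspace_step(x, directions)
        past_steps.append(x_new - x)        # remember the boosting step
        x = x_new
    return x

x0 = np.zeros(10)
x_boost = boosted_descent(x0)
```

Because the subspace has only a handful of dimensions and the step is applied once per block of baseline steps, the extra cost per outer iteration stays small relative to the baseline work, which matches the abstract's remark about computational burden.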


Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n df_λ), a memory complexity of order O(df_λ^2) and no dependence on the condition number, generalizing the results known for least-squares regression. Here n is the number of observations and df_λ is the associated degrees of freedom.